ABSTRACT



A Digital Video Quality (DVQ) apparatus and method that 
incorporate a model of human visual sensitivity to predict 
the visibility of artifacts. The DVQ method and apparatus 
are used for the evaluation of the visual quality of processed 
digital video sequences and for adaptively controlling the bit 
rate of the processed digital video sequences without com- 
promising the visual quality. The DVQ apparatus minimizes 
the required amount of memory and computation. The input 
to the DVQ apparatus is a pair of color image sequences: an 
original (R) non-compressed sequence, and a processed (T) 
sequence. Both sequences (R) and (T) are sampled, cropped, 
and subjected to color transformations. The sequences are 
then subjected to blocking and discrete cosine 
transformation, and the results are transformed to local 
contrast. The next step is a time filtering operation which 
implements the human sensitivity to different time frequen- 
cies. The results are converted to threshold units by dividing 
each discrete cosine transform coefficient by its respective 
visual threshold. At the next stage the two sequences are 
subtracted to produce an error sequence. The error sequence 
is subjected to a contrast masking operation, which also 
depends upon the reference sequence (R). The masked errors 
can be pooled in various ways to illustrate the perceptual 
error over various dimensions, and the pooled error can be 
converted to a visual quality measure. 

32 Claims, 6 Drawing Sheets

METHOD AND APPARATUS FOR EVALUATING THE VISUAL QUALITY OF
PROCESSED DIGITAL VIDEO SEQUENCES

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the priority of co-pending
provisional patent application Ser. No. 60/077,862, filed on
Mar. 13, 1998, which is incorporated herein in its entirety.

ORIGIN OF THE DISCLOSURE

The invention described herein was made by an employee
of the National Aeronautics and Space Administration and it
may be manufactured and used by and for the United States
Government for governmental purposes without the payment
of royalties thereon or therefore.

BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to a method and apparatus
for the evaluation of the visual quality of processed digital
video sequences. One common form of processing is compression
to reduce the bit-rate of digital video. The invention
can be used in various applications such as the automatic and
continuous monitoring of processing of digital video
sequences for transmission as High Definition Television
(HDTV) or Direct Broadcast System (DBS) TV. More
particularly, the present invention relates to a Digital Video
Quality (DVQ) apparatus and method that incorporate a
model of human visual sensitivity to predict the visibility of
artifacts and the visual quality of processed video.

2. Description of Related Art
Considerable research has been conducted in the field of
data compression, especially the compression of digital
images. Digital images comprise a rapidly growing segment
of the digital information stored and communicated by
science, commerce, industry and government. Digital image
transmission has gained significant importance in highly
advanced television systems, such as high definition television
using digital information. Because a relatively large
number of digital bits are required to represent digital
images, a difficult burden is placed on the infrastructure of
the computer communication networks involved with the
creation, transmission and re-creation of digital images. For
this reason, there is a need to compress digital images to a
smaller number of bits, by reducing redundancy and invisible
image components of the images themselves.

A system that performs image compression is disclosed in
U.S. Pat. No. 5,121,216 of Chen et al. and is incorporated
herein by reference. The '216 patent describes a transform
coding algorithm for a still image, wherein the image is
divided into small blocks of pixels. For example, each block
of pixels can be either an 8x8 or 16x16 block. Each block
of pixels undergoes a two dimensional transform to produce
a two dimensional array of transform coefficients. For still
image coding applications, a Discrete Cosine Transform
(DCT) is utilized to provide the transform.

In addition to the '216 patent, the DCT is also employed
in a number of current and future international standards
concerned with digital image compression, commonly
referred to as JPEG and MPEG, which are acronyms for
Joint Photographic Experts Group and Moving Pictures
Experts Group, respectively. After a block of pixels of the
'216 patent undergoes a DCT, the resulting transform coefficients
are subject to compression by thresholding and
quantization operations. Thresholding involves setting all
coefficients whose magnitude is smaller than a threshold
value equal to zero, whereas quantization involves scaling a
coefficient by step size and rounding off to the nearest
integer.

Commonly, the quantization of each DCT coefficient is
determined by an entry in a quantization matrix. It is this
matrix that is primarily responsible for the perceived image
quality and the bit rate of the transmission of the image. The
perceived image quality is important because the human
visual system can tolerate a certain amount of degradation of
an image without being alerted to a noticeable error.
Therefore, certain images can be transmitted at a low bit
rate, whereas other images cannot tolerate degradation and
should be transmitted at a higher bit rate in order to preserve
their informational content.

The '216 patent discloses a method for the compression of
image information based on human visual sensitivity to
quantization errors. In the method of the '216 patent, there is a
quantization characteristic associated with block to block
components of an image. This quantization characteristic is
based on a busyness measurement of the image. The method
of the '216 patent does not compute a complete quantization
matrix, but rather a single scalar quantizer.

Recent years have seen the introduction and widespread
acceptance of several varieties of digital video. These
include digital television broadcasts from satellites (DBS-TV),
the US Advanced Television System (ATV), digital
movies on a compact disk (DVD), and digital video cassette
recorders (DV). Such a trend is expected to continue in the
near future and to expand to widespread terrestrial broadcast
and cable distribution of digital television systems.

Most of these systems depend upon lossy compression of
the video stream. Lossy compression can introduce visible
artifacts, and indeed there is an economic incentive to reduce
bit rate to the point where artifacts are almost visible.
Compounding the problem is the "bursty" nature of digital
video, which requires adaptive bit allocation based on visual
quality metrics, and the economic need to reduce bit rate to
the lowest level that yields acceptable quality.

For this reason, there is an urgent need for a reliable
means to automatically evaluate the visibility of compression
artifacts, and more generally, the visual quality of
processed digital video sequences. Such a means is essential
for the evaluation of codecs, for monitoring broadcast
transmissions, and for ensuring the most efficient compression
of sources and utilization of communication bandwidths.

The following references, which are incorporated herein by
reference, describe visual quality metrics for evaluating,
controlling, and optimizing the quality of compressed still
images, and incorporate simplified models of human visual
sensitivity to spatial and chromatic visual signals:

A. B. Watson, "Image Data Compression Having Minimum
Perceptual Error," U.S. Pat. No. 5,629,780 (1997).
A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor,
"Visibility of Wavelet Quantization Noise," IEEE
Transactions on Image Processing, 6(8), 1164-1175 (1997).
A. B. Watson, "Perceptual Optimization of DCT Color
Quantization Matrices," IEEE International Conference
on Image Processing, 1, 100-104 (1994).
A. B. Watson, "Image Data Compression Having Minimum
Perceptual Error," U.S. Pat. No. 5,426,512 (1995).
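
The thresholding and quantization operations described in the background above can be illustrated with a minimal sketch. The function names, the 2x2 block excerpt, the threshold value, and the step sizes are hypothetical and chosen only for illustration; they are not taken from the '216 patent or from any standard.

```python
def threshold_block(block, threshold):
    """Set every coefficient whose magnitude is below `threshold` to zero."""
    return [[0.0 if abs(c) < threshold else c for c in row] for row in block]


def quantize_block(block, q_matrix):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(block, q_matrix)]


def dequantize_block(levels, q_matrix):
    """Approximate reconstruction: multiply each integer level by its step size."""
    return [[n * q for n, q in zip(n_row, q_row)]
            for n_row, q_row in zip(levels, q_matrix)]


# Hypothetical 2x2 excerpt of a DCT block and of a quantization matrix.
coefficients = [[52.0, -3.2], [7.0, 0.4]]
step_sizes = [[16, 2], [4, 2]]

kept = threshold_block(coefficients, threshold=1.0)
levels = quantize_block(kept, step_sizes)
restored = dequantize_block(levels, step_sizes)
print(levels)    # [[3, -2], [2, 0]]
print(restored)  # [[48, -4], [8, 0]]
```

Larger step sizes yield smaller integers (fewer bits) but a coarser reconstruction, which is why the quantization matrix governs both bit rate and perceived quality.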
It would be desirable to extend the still image metrics
described in the foregoing references to cover moving
images. Most, if not all, video quality metrics are inherently
models of human vision. For example, if root-mean-squared-error
(RMSE) is used as a quality metric, this
amounts to the assumption that the human observer is 
sensitive to the summed squared deviations between refer- 
ence and test sequences, and is insensitive to aspects such as 
the spatial frequency of the deviations, their temporal 
frequency, or their color. The DVQ metric is an attempt to 
incorporate many aspects of human visual sensitivity in a 
simple image processing algorithm. Simplicity is an impor- 
tant goal, since one would like the metric to run in real-time 
and require only modest computational resources. 
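
The point about RMSE can be made concrete with a short sketch. The sample values below are hypothetical: two error patterns of equal energy, one varying slowly and one alternating at every sample, receive the same RMSE score even though a human observer would not necessarily find them equally visible.

```python
import math


def rmse(reference, test):
    """Root-mean-squared error between two equally sized sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.sqrt(squared / len(reference))


# Hypothetical row of reference pixels and two error patterns of equal energy.
reference = [100.0] * 8
slow_error = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]   # low spatial frequency
fast_error = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # high spatial frequency

test_slow = [r + e for r, e in zip(reference, slow_error)]
test_fast = [r + e for r, e in zip(reference, fast_error)]

print(rmse(reference, test_slow), rmse(reference, test_fast))  # 1.0 1.0
```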

A number of video quality metrics have been proposed in 
the following references: 

K. T. Tan, M. Ghanbari, and D. E. Pearson, "A Video
Distortion Meter," Picture Coding Symposium,
119-122 (1997).
T. Hamada, S. Miyaji, and S. Matsumoto, "Picture Quality
Assessment System By Three-Layered Bottom-Up
Noise Weighting Considering Human Visual
Perception," Society of Motion Picture and Television
Engineers, 179-192 (1997).
C. J. v. d. B. Lambrecht, "Color Moving Pictures Quality
Metric," International Conference on Image
Processing, I, 885-888 (1996).
A. B. Watson, "Multidimensional Pyramids In Vision And
Video," Representations of Vision: Trends and Tacit
Assumptions in Vision Research, A. Gorea, 17-26,
Cambridge University Press, Cambridge (1991).
A. B. Watson, "Perceptual-Components Architecture For
Digital Video," Journal of the Optical Society of
America A, 7(10), 1943-1954 (1990).
A. A. Webster, C. T. Jones, M. H. Pinson, S. D. Voran, and
S. Wolf, "An Objective Video Quality Assessment
System Based On Human Perception," Human Vision,
Visual Processing, and Digital Display IV, SPIE
Proceedings, 1913, 15-26 (1993).
J. Lubin, "A Human Vision System Model for Objective
Picture Quality Measurements," International Broadcasters'
Convention, Conference Publication of the
International Broadcasters' Convention, 498-503
(1997).
S. Wolf, M. H. Pinson, A. A. Webster, G. W. Cermak, and
E. P. Tweedy, "Objective And Subjective Measures Of
MPEG Video Quality," Society of Motion Picture and
Television Engineers, 160-178 (1997).
Some of the video quality metrics described in the foregoing
references cover spatial filtering operations employed
to implement the multiple, bandpass, spatial filters that are
characteristic of human vision. A shortcoming of these video
quality metrics is that if they are not based closely enough
upon human perception, they cannot accurately measure
visual quality. Alternatively, if they are based closely upon
human perception, they require significant memory or
computational resources that restrict the contexts in which
they can be applied.

Therefore, there is still an unsatisfied need for a quality 
metric for digital video, which is reasonably accurate but 
computationally efficient. 

SUMMARY OF THE INVENTION 

A feature of the present invention is to provide a Digital
Video Quality (DVQ) apparatus and method that incorporate
a model of human visual sensitivity to predict the visibility
of artifacts. The DVQ method and apparatus are used for the
evaluation of the visual quality of processed or compressed
digital video sequences, and for adaptively controlling the
bit rate of the processed digital video sequences without
compromising the visual quality. The DVQ apparatus minimizes
the required amount of memory and computation.

The inventive Digital Video Quality (DVQ) apparatus can
be widely used in various commercial applications including
but not limited to satellite broadcasting of digital television
(DBS-TV), movies on compact disc (DVD), high definition
digital television (HDTV), digital video camcorders (DV),
Internet video, digital terrestrial television broadcasting, and
digital cable television distribution.

The present DVQ method offers significant advantages
over conventional metrics in that the present DVQ method
incorporates a reasonably accurate human vision model into
a relatively simple processing architecture. A contributor to
such architectural simplicity is the use of discrete cosine
transforms (DCT) as a spatial filter bank, since the hardware
and software to implement the DCT are widely available,
due to its prevalence in most existing standards for video
compression. Indeed, in some applications of the present
DVQ method, the DCT may have already been computed as
part of the digital video compression process.

Another contributor to the architectural simplicity of the
present DVQ method is the use of Infinite Impulse Response
(IIR) filters in the temporal filtering stages. This reduces the
amount of computation and memory required relative to
Finite Impulse Response (FIR) implementations.
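
The memory argument can be sketched as follows, under stated assumptions. The first-order recursive filter with unit DC gain and the moving-average FIR filter used for comparison are illustrative choices, and the class names and coefficient values are hypothetical. The IIR filter stores only its previous output, whereas an FIR filter with N taps must store N past inputs.

```python
from collections import deque


class FirstOrderIIR:
    """y[i] = b * x[i] + a * y[i-1]; only the previous output is stored."""

    def __init__(self, a):
        self.a = a
        self.b = 1.0 - a            # gives unit gain at DC
        self.previous_output = 0.0  # the entire filter state

    def step(self, x):
        self.previous_output = self.b * x + self.a * self.previous_output
        return self.previous_output


class MovingAverageFIR:
    """Average of the last `taps` inputs; all of them must be kept in memory."""

    def __init__(self, taps):
        self.history = deque([0.0] * taps, maxlen=taps)

    def step(self, x):
        self.history.append(x)
        return sum(self.history) / len(self.history)


iir = FirstOrderIIR(a=0.5)
fir = MovingAverageFIR(taps=4)
print([iir.step(1.0) for _ in range(3)])  # [0.5, 0.75, 0.875]
print([fir.step(1.0) for _ in range(3)])  # [0.25, 0.5, 0.75]
```

In a video setting each filter input is a whole frame of coefficients, so the difference is roughly one stored frame for the recursive filter versus one stored frame per tap for the FIR filter.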

The foregoing and other features and advantages of the
present invention are achieved by a new DVQ apparatus and
method. The input to the DVQ apparatus is a pair of color
image sequences: the reference (R) or original non-compressed
sequence, and the test (T) or processed
sequence. Both sequences (R) and (T) are sampled, cropped,
and subjected to color transformations. The sequences are
then subjected to blocking and DCT transformation, and the
results are transformed to local contrast. The next step is a
time filtering operation which implements the human sensitivity
to different time frequencies. The results are then
converted to threshold units by dividing each DCT coefficient
by its respective visual threshold. At the next stage the
two sequences are subtracted to produce an error sequence.
The error sequence is then subjected to a contrast masking
operation, which also depends upon the reference sequence
(R). The masked errors can be pooled in various ways to
illustrate the perceptual error over various dimensions, and
the pooled error can be converted to a visual quality (VQ)
measure.
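
Two of the steps summarized above, conversion to threshold units and subtraction of the two sequences, can be sketched as follows. The 2x2 block excerpts, the threshold values, and the sign convention (reference minus test) are assumptions made for illustration only.

```python
def to_threshold_units(block, thresholds):
    """Divide each coefficient by its own visual threshold."""
    return [[c / t for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(block, thresholds)]


def error_block(reference_units, test_units):
    """Element-wise difference of two blocks already in threshold units."""
    return [[r - t for r, t in zip(r_row, t_row)]
            for r_row, t_row in zip(reference_units, test_units)]


# Hypothetical 2x2 excerpts of filtered coefficients and visual thresholds.
reference_block = [[8.0, 4.0], [2.0, 1.0]]
test_block = [[6.0, 4.0], [3.0, 1.0]]
thresholds = [[2.0, 4.0], [1.0, 0.5]]

reference_units = to_threshold_units(reference_block, thresholds)
test_units = to_threshold_units(test_block, thresholds)
print(error_block(reference_units, test_units))  # [[1.0, 0.0], [-1.0, 0.0]]
```

Because every coefficient is scaled by its own threshold, an error magnitude near 1 corresponds roughly to a deviation at the visibility threshold, whatever the spatial frequency of the coefficient.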

BRIEF DESCRIPTION OF THE DRAWINGS 

The features of the present invention and the manner of
attaining them will become apparent, and the invention
itself will be understood, by reference to the following
description and the accompanying drawings, with similar
numerals referring to similar or identical elements, wherein:

FIG. 1 is a high level block diagram of a video encoding
system utilizing a DVQ apparatus according to the present
invention;

FIG. 2 is a functional block diagram of the DVQ apparatus
of FIG. 1 made according to the present invention;

FIG. 3 is a block diagram of a color transformer incorporated
in the DVQ apparatus of FIG. 2, and illustrating a
color transformation process according to the present invention;

FIG. 4 is a block diagram of a local contrast converter
incorporated in the DVQ apparatus of FIG. 2, and illustrating
the computational steps of the local contrast process
according to the present invention;

FIG. 5 is a block diagram of a time filter incorporated in
the DVQ apparatus of FIG. 2, and illustrating an exemplary
temporal filtering process implemented by means of a second
order IIR filter according to the present invention; and

FIG. 6 is a block diagram of a contrast masking processor
incorporated in the DVQ apparatus of FIG. 2, and illustrating
the computational steps of the contrast masking processor
according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 illustrates a video encoding system 10 that incorporates
a DVQ apparatus 20 according to the present invention.
In operation, a sequence of original (R) digital video is
fed to the input of a video codec 25, and is processed
thereby. The video codec 25 is a well known device for
coding and decoding video sequences. The sequence of
original (R) video and the sequence of processed (T) video
generated by the codec 25 are fed to the DVQ apparatus 20
for quality evaluation. The resultant error or quality control
signal ($E_\Omega$, $Q_\Omega$) can be fed back to the codec 25 for
regulating the compression bit rate to correspond to the
desired image visual quality.

FIG. 2 provides an overview of the processing steps of the
DVQ apparatus 20. These steps will be described later in
greater detail. The input to the DVQ apparatus 20 is a pair
of color video sequences: the original or reference (R) video
sequence, and the processed or test (T) video sequence. Each
color video sequence includes three color channels which
can be, for example, the Y, Cb, and Cr channels used in
digital television.

The two sequences (R) and (T) are spatially sampled by
a sampler 30 to convert the three color channels to a
common spatial resolution. The sampled sequences ($d_1$) are
processed by a region-of-interest (ROI) processor 32 to
restrict the processing to a region of interest, or to weight
some regions more than others. The region of interest
sequences ($d_2$) are then processed by a color transformer 34
to convert the color channels to a perceptually relevant color
space. The resulting sequences ($d_9$) are then processed by a
block constructor 36 to divide each image
into blocks, for processing by a discrete cosine transformer
(DCT) 38.

The discrete cosine transformer 38 converts each of these
blocks to a block of frequency (or DCT) coefficients ($d_{10}$),
to allow subsequent frequency domain processing. The
resulting frequency coefficients ($d_{10}$) are then transformed to
local contrast coefficients by a local contrast converter 40, in
order to implement a light-adaptation process. The next step
is a time filtering operation implemented by a time filter 42,
which implements the human sensitivity to different time
frequencies. The resulting filtered components ($d_{18}$) are then
converted to threshold units ($d_{19}$) by a threshold scaler 44,
to implement the human sensitivity to different spatial
frequencies. The threshold scaler 44 divides each DCT
coefficient by its respective visual threshold.

At the next stage, the threshold units ($d_{19}$) corresponding
to the (R) and (T) sequences are subtracted by a subtractor
46, to obtain an error sequence ($d_{20}$). The error sequence
($d_{20}$) is then subjected to a contrast masking operation by a
contrast masking processor 48. The contrast masking processor
48 receives the threshold units ($d_{19}$) corresponding to
the reference sequence (R) outputted by the threshold scaler
44, to control the masking operation and to generate a
masked error sequence ($d_{24}$).

The masked error sequence ($d_{24}$) is pooled by a pooling
processor 50, to combine the perceptual error over various
dimensions. The pooled error ($E_\Omega$) can be converted to a
visual quality measure ($Q_\Omega$) by a visual quality converter 52,
to provide an output in terms of quality rather than an error
value.

Having provided an overview of the processing steps of
the DVQ apparatus 20, the DVQ method and apparatus 20
will now be described in greater detail with further reference
to FIGS. 3 through 6.

Input Sequences
The input to the DVQ metric is a pair of color video
sequences (indexed by s). Each sequence includes an
ordered set of color images (indexed by i), and each color
image includes a set of three images, one for each of three
color channels (indexed by c). Each image includes a set of
rows (indexed by y), and each row includes a set of pixels
(indexed by x). The first of the two sequences (s=1) is the
reference sequence (R), the second (s=2) is the test sequence
(T). Typically, the test sequence (T) differs from the reference
sequence (R) in the presence of compression or other
artifacts. The input color space, indexed by $c_{in}$, is defined in
sufficient detail that it can be transformed into CIE
coordinates, for example by specifying the gamma and
chromaticity coordinates of each color channel. The input is
expressed as follows:

\[ d_0(s, i, c_{in}, y, x) \tag{1} \]

The size of the dimensions i, y, and x depends upon the
application. Also, since the DVQ metric can be computed
continuously in a pipeline fashion upon a continuous stream
of video images, the dimension indexed by i might not have
a finite size. Associated with this input is a video image rate
($w_v$) expressed in Hertz (Hz), which specifies the time
frequency of images in the input, and a display image rate
($w_d$), which specifies the time frequency of images on the
display. Also associated with this input are various other
display parameters, such as the color space, gamma, spatial
resolution, and veiling light, all of which will be discussed
below.

Up-Sampling of Color Components
In many digital video formats, the three color
channels, such as Y, Cb, and Cr, are represented with
different spatial resolutions. For example, the 4:2:2 variant
of CCIR-601 standard digital video is described in
"Recommendation ITU-R BT.601-5, Studio Encoding Parameters
of Digital Television for Standard 4:3 and Wide Screen
16:9 Aspect Ratios," (1995). The two color channels (Cb and
Cr) in that standard are represented by 360 pixels/line, while
the luminance channel (Y) is represented by 720 pixels/line.
According to the present invention, these channels (Y, Cb,
and Cr) are converted to a common resolution before color
conversion. In this example, the two color channels (Cb and
Cr) are expanded horizontally by a factor of two through a
process of up-sampling (US). Up-sampling is performed by
the sampler 30 shown in FIG. 2. Although various
up-sampling processes are possible, one example is pixel-replication,
which is expressed by the following expression:

\[ d_1(s, i, c_{in}, y, x) = US[d_0(s, i, c_{in}, y, x)] \tag{2} \]

The up-sampling factors for each direction d (vertical=1,
horizontal=2) and color channel c are specified by an array
us(c,d). In the example above, this array would be {{1,1},
{1,2}, {1,2}}.
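
Up-sampling by pixel replication with the factor array {{1,1}, {1,2}, {1,2}} can be sketched as follows. The function names and the tiny single-row channels are hypothetical, and each channel is represented as a plain list of rows for illustration.

```python
def upsample_replicate(image, vertical, horizontal):
    """Repeat every pixel `horizontal` times and every row `vertical` times."""
    result = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(horizontal)]
        for _ in range(vertical):
            result.append(list(wide_row))
    return result


# (vertical, horizontal) factors for the Y, Cb, and Cr channels of 4:2:2 video.
US_FACTORS = [(1, 1), (1, 2), (1, 2)]


def upsample_frame(channels, factors=US_FACTORS):
    """Bring all three channels of one frame to a common resolution."""
    return [upsample_replicate(channel, v, h)
            for channel, (v, h) in zip(channels, factors)]


# Hypothetical one-row frame: Y has 4 pixels per line, Cb and Cr have 2.
y_channel = [[1, 2, 3, 4]]
cb_channel = [[5, 6]]
cr_channel = [[7, 8]]

print(upsample_frame([y_channel, cb_channel, cr_channel]))
# [[[1, 2, 3, 4]], [[5, 5, 6, 6]], [[7, 7, 8, 8]]]
```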






US 6,493,023 B1


Region of Interest (ROI) Processing
The sampled sequences ($d_1$) can be spatially cropped by
the ROI processor 32 (FIG. 2) to a ROI ($d_2$). This confines
the computation of the DVQ apparatus 20 to that region, as
expressed by the following equation:

\[ d_2(s, i, c_{in}, y, x) = ROI[d_1(s, i, c_{in}, y, x)] \tag{3} \]

It is also convenient to make this region an integer
multiple of 8x8 pixel blocks, or larger, if color down-sampling
is used as described below. In an extension,
regions-not-of-interest within the region-of-interest may be
excluded by means of a binary mask. The region of interest
can also be a continuously varying mask of weights, to
emphasize some regions more than others.

Color Transformation Operation
The color transformation process shown in FIG. 3 is
implemented by the color transformer 34 (FIG. 2), and will
now be described in detail.

Transformation to R'G'B' Color Channels
The ROI sequences ($d_2$) are transformed from their native
color space $c_{in}$ to, for example, gamma-corrected color
channels R', G', and B' by a R'G'B' transformer 100. For
example, if $c_{in}$ corresponds to the YCbCr color space of
CCIR-601 standard digital video, then the color channels R',
G', and B' are expressed by the following equation:

[Equation (4): matrix garbled in source; the legible terms of
the input vector are Y, Cb-128, and Cr-128.]   (4)

The resulting color transformed sequences ($d_3$) can be
expressed by the following equation:

\[ d_3(s, i, c_{R'G'B'}, y, x) = R'G'B'[d_2(s, i, c_{in}, y, x)] \tag{5} \]

Transformation to RGB Color Channels
The R'G'B' color channels are converted to RGB color
channels by a RGB transformer 102. This conversion is
effected by dividing the R'G'B' color channels by 255,
clipping to the range [0,1], and raising the result to an
exponent $\gamma$. The clipping may be necessary because the
range of R'G'B' values, combined with the interpolation process,
can produce excursions outside the permitted range. The
resulting color transformed sequences ($d_4$) can be expressed
by the following equation, in which $[\cdot]_0^1$ denotes clipping
to the range [0,1]:

\[ d_4(s, i, c, y, x) = \left( \left[ \frac{d_3(s, i, c, y, x)}{255} \right]_0^1 \right)^{\gamma} \tag{6} \]

Transformation to XYZ Coordinates
The RGB color channels ($d_4$) are then converted to the
standard CIE color coordinates XYZ ($d_5$) by a XYZ transformer
104. This is accomplished through multiplication by
a user-supplied matrix ${}_{RGB}M_{XYZ}$ that describes the simulated
display. In the following expression, the dot product is
carried out over the index $c_{RGB}$:

\[ d_5(s, i, c_{XYZ}, y, x) = {}_{RGB}M_{XYZ} \cdot d_4(s, i, c_{RGB}, y, x) \tag{7} \]

Transformation to YOZ Coordinates
The XYZ color coordinates ($d_5$) are converted to color
coordinates YOZ ($d_6$) by a YOZ transformer 106. This
transformation is described in H. Peterson, A. J. Ahumada,
Jr. and A. Watson, "An Improved Detection Model for DCT
Coefficient Quantization," SPIE Proceedings, 1913,
191-201 (1993), which is incorporated by reference, in
modeling perceptual errors in still image compression. In the
present invention, the transformation is accomplished
through multiplication by a matrix ${}_{XYZ}M_{YOZ}$. In the following
expression, the dot product is carried out over the index
$c_{XYZ}$:

\[ d_6(s, i, c_{YOZ}, y, x) = {}_{XYZ}M_{YOZ} \cdot d_5(s, i, c_{XYZ}, y, x) \tag{8} \]

The transformations to XYZ and to YOZ can be concatenated
into a single matrix multiplication.

Although the operation of the color transformer 34 has
been described in terms of specific color transformations
(e.g. 100, 102, 104, 106), it should be understood that
alternate transformations can be used to arrive at a perceptual
color space.

De-Interlacing
If the input sequence (R or T) contains interlaced video
fields, then the index i specifies fields, and odd numbered
fields contain odd (or even) numbered video lines, and even
fields contain even (or odd) video lines. In this case, the first
step includes converting the interlaced fields to a progressive
sequence ($d_7$) by means of a de-interlacer 110. The
de-interlacing process can be implemented, for example, by
one of three methods, depending upon the system requirements.
Each of these three de-interlacing methods will now
be described in detail.

1. De-interlacing by Inserting Blank Lines.
In this method, each field is converted to an image by
inserting blank lines into even numbered lines in odd (or
even) fields, and odd numbered lines in even (or odd) fields.
This method doubles the total number of pixels to be
processed. The advantage of this method is that it correctly
represents the spatial and time relationship of all video lines.
In this method, the display image rate is specified as equal
to the video image rate ($w_d = w_v$).

2. De-interlacing by Treating One Field as One Image.
In this method, each field is treated as an image. This
method is more efficient than the method of inserting blank
lines, since the number of pixels is not increased. However,
this method does not completely accurately represent the
spatial relationship of lines in odd and even fields. For
example, the first lines of odd and even fields are treated as
superimposed rather than offset by one line. In this method,
the display image rate is specified as equal to the video
image rate ($w_d = w_v$).

3. De-interlacing by Treating Two Fields as One Image.
In this method, each pair of odd and even fields is
combined into one image; the odd field contributes the odd
(or even) lines, and the even field contributes the even (or
odd) lines. This method is as efficient as the method above
of treating each field as an image, since the number of pixels
is not increased (the number of images is halved, but the
number of lines/image is doubled). However, this method
does not completely correctly represent the temporal relationship
of lines in odd and even fields. For example, the odd
and even fields are treated as occurring in the same field
rather than offset by one field time. In this method, the
display image rate is half the video image rate ($w_d = w_v/2$).

The application of the de-interlace operation can be
expressed by the following equation:

\[ d_7(s, i, c, y, x) = DI[d_6(s, i, c, y, x)] \tag{9} \]

It should also be noted that this operation can change the size
of dimensions i or y, depending on which method is selected.
If the input is progressive video, then the de-interlace
operation is omitted.

Veiling Light
The next step is the addition of a veiling light to both
processed sequences (R) and (T) by a veiling light combiner
112. This veiling light represents the ambient light reflected
off the display toward an observer, and is specified by a
vector of three numbers v, the CIE XYZ coordinates of the
veiling light. To add this veiling light to the sequence, it is
first converted to YOZ coordinates, as specified above, and
as expressed by the following equation:

\[ d_8(s, i, c_{YOZ}, y, x) = d_7(s, i, c_{YOZ}, y, x) + {}_{XYZ}M_{YOZ} \cdot v \tag{10} \]

where the result vector is understood to be added to each
color pixel.

Down-Sampling of Color Components
Since visual acuity for color signals is much lower than
that for luminance, it is often possible to lower the resolution
of the two color channels O and Z. To achieve this, the color
channels O and Z are down-sampled by factors of ds(c,d),
where c is color (Y, O, or Z), and d is direction (vertical or
horizontal), by means of a down sampler 114. This down-sampling
process can be accomplished by any number of
well known or available filtering and sampling procedures,
such as block-averaging. The color down-sampling step can
be expressed by the following equation:

\[ d_9(s, i, c, y, x) = DS[d_8(s, i, c, y, x)] \tag{11} \]

Blocked DCT
Referring back to FIG. 2, each image in each color channel
of the color transformed sequences ($d_9$) is divided into 8x8
pixel blocks by the block constructor 36, and a DCT is
applied to each block by the DCT transformer 38. This
operation is referred to as blocked DCT (BDCT). The input
will typically have been cropped to an integer number of
blocks horizontally and vertically. The dimensions of the
result are {s, i, c, by, bx, v, u}, where by and bx are the
number of blocks in vertical and horizontal directions,
respectively, and where v and u are the DCT frequencies that
are integers between 0 and 7. The BDCT operation is
expressed by the following equation:

\[ d_{10}(s, i, c, by, bx, v, u) = BDCT[d_9(s, i, c, y, x)] \tag{12} \]

Local Contrast
FIG. 4 represents a functional block diagram of an exemplary
local contrast converter 40. The local contrast converter
40 converts the DCT coefficients ($d_{10}$) to units of local
contrast ($d_{17}$). First, the DCT coefficients ($d_{10}$) are adjusted
by the relative magnitudes of their coefficients corresponding
to a unit contrast basis function (A) 200, as illustrated in
the following Table 1, and as expressed by the following
equation:

\[ d_{11}(s, i, c, by, bx, v, u) = A(v, u)\, d_{10}(s, i, c, by, bx, v, u) \tag{13} \]

DC coefficients ($d_{12}$) are then extracted from all the
blocks ($d_{11}$) by a DC extractor (DC) 202, as expressed by the
following equation:

\[ d_{12}(s, i, c, by, bx) = d_{11}(s, i, c, by, bx, 0, 0) \tag{14} \]

The DC coefficients ($d_{12}$) are then time filtered by a time
filter (TF) 204, using a first-order, low-pass, IIR filter with
a gain of 1, for generating filtered coefficients ($d_{13}$), as
expressed by the following equation:

\[ d_{13}(s, i, c, by, bx) = b_1\, d_{12}(s, i, c, by, bx) + a_1\, d_{13}(s, i-1, c, by, bx) \tag{15} \]

where $b_1$ and $a_1$ are filter parameters. If desired, these filter
parameters $b_1$ and $a_1$ can also be made into arrays, dependent
upon c, v, u.

Since the image rate of the digital video can vary from
application to application, it is necessary to define the filter
parameters in a way that is independent of the image-rate.
For a first order low-pass IIR filter with unit DC gain this can
be done by specifying a time constant $\tau_1$ in seconds and a
display image-rate $w_d$ in Hz. The filter parameters $a_1$ and $b_1$
can then be expressed by the following equations:

\[ a_1 = e^{-1/(\tau_1 w_d)} \tag{16} \]

\[ b_1 = 1 - a_1 \tag{17} \]

Thereafter, a data structure ($d_{14}$) is created in which the
elements of the filtered coefficients ($d_{13}$) corresponding to
color channel O are discarded and replaced by filtered
coefficients ($d_{13}$) corresponding to color channel Y, using a
YYZ channel exchanger 206, as expressed by the following
equation:

\[ d_{14}(s, i, \{Y, O, Z\}, by, bx) = d_{13}(s, i, \{Y, Y, Z\}, by, bx) \tag{18} \]

If desired, the channel exchanger 206 can also substitute
the Y color channel coefficients for the Z color channel
coefficients as well. The adjusted DCT coefficients ($d_{11}$) are
divided by the filtered DC coefficients ($d_{14}$) on a block-by-block
basis, as expressed by the following equation:

\[ d_{16}(s, i, c, by, bx, v, u) = \frac{d_{11}(s, i, c, by, bx, v, u)}{d_{14}(s, i, c, by, bx)} \tag{19} \]

The DC coefficients ($d_{12}$) are converted in a similar
fashion. First, a mean DC coefficient ($d_{15}$) is computed over
the entire image by an averager 210, as follows:

$$d_{15}(s,i,c) = \frac{1}{N_{by}\, N_{bx}} \sum_{by} \sum_{bx} d_{12}(s,i,c,by,bx) \tag{20}$$

where N_by and N_bx denote the numbers of blocks in the vertical and
horizontal directions.

TABLE 1

Relative magnitudes of unit contrast DCT Basis Functions, A(v, u)

v \ u   0        1        2        3        4        5        6        7
0       1.       1.38704  1.30656  1.38704  1.       1.38704  1.30656  1.38704
1       1.38704  1.92388  1.81225  1.92388  1.38704  1.92388  1.81225  1.92388
2       1.30656  1.81225  1.7071   1.81225  1.30656  1.81225  1.7071   1.81225
3       1.38704  1.92388  1.81225  1.92388  1.38704  1.92388  1.81225  1.92388
4       1.       1.38704  1.30656  1.38704  1.       1.38704  1.30656  1.38704
5       1.38704  1.92388  1.81225  1.92388  1.38704  1.92388  1.81225  1.92388
6       1.30656  1.81225  1.7071   1.81225  1.30656  1.81225  1.7071   1.81225
7       1.38704  1.92388  1.81225  1.92388  1.38704  1.92388  1.81225  1.92388
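The blocked DCT and the Table 1 adjustment can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II normalization, which the text does not state, and it assumes that A(v, u) is the peak amplitude of each two-dimensional basis function relative to the DC basis function. That interpretation is an inference from the tabulated values, which it appears to reproduce to the printed precision. The function names are illustrative only.

```python
import math

N = 8  # block size used by the block constructor


def dct_basis(k, x, n=N):
    """Orthonormal DCT-II basis function k at sample x (assumed normalization)."""
    alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return alpha * math.cos((2 * x + 1) * k * math.pi / (2 * n))


def peak_ratio(k, n=N):
    """Peak amplitude of basis k relative to the (constant) DC basis function."""
    peak = max(abs(dct_basis(k, x, n)) for x in range(n))
    return peak / dct_basis(0, 0, n)


def unit_contrast_magnitudes(n=N):
    """A(v, u) as a separable product of one-dimensional peak ratios."""
    a = [peak_ratio(k, n) for k in range(n)]
    return [[a[v] * a[u] for u in range(n)] for v in range(n)]


def dct_block(block, n=N):
    """Two-dimensional DCT of one n x n block, indexed [v][u]."""
    return [[sum(block[y][x] * dct_basis(v, y, n) * dct_basis(u, x, n)
                 for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]


def blocked_dct(image, n=N):
    """Coefficients indexed [by][bx][v][u] for an image cropped to whole blocks."""
    rows, cols = len(image), len(image[0])
    assert rows % n == 0 and cols % n == 0, "crop to whole blocks first"
    return [[dct_block([row[bx * n:(bx + 1) * n]
                        for row in image[by * n:(by + 1) * n]], n)
             for bx in range(cols // n)]
            for by in range(rows // n)]


A = unit_contrast_magnitudes()
```

Multiplying each coefficient of `blocked_dct(...)` by the matching entry of `A` corresponds to the adjustment step described for the local contrast converter.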



The mean filtered DC coefficients (d16) are computed by another
averager 212, as follows:

$$d_{16}(s,i,c) = \cdots \tag{21}$$

The difference between the DC coefficients (d12) and their mean (d15)
is divided by the mean filtered DC coefficients (d16), again using
the Y component for the denominator of the O component, to generate
the DC contrast coefficients (d17b), as follows:

$$d_{17b}(s,i,c,by,bx,0,0) = \frac{d_{12}(s,i,c,by,bx) - d_{15}(s,i,c)}{d_{16}(s,i,c)} \tag{22}$$

The insertion of the processed DC contrast coefficients (d17b) is
labeled DC⁻¹. These operations convert each DCT coefficient (d10) to
a number that expresses the amplitude of the corresponding basis
function as a fraction of the time-filtered average luminance in the
corresponding block. For the DC coefficients (d12), it expresses
their difference from the mean DC coefficient (d15) as a fraction of
the mean filtered DC coefficients (d16).

The final local contrast signal (d17) is composed of AC coefficients
(d17a) combined with DC coefficients (d17b), by means of a DC
insertion process (DC⁻¹) 208.

Temporal Filtering

With further reference to FIG. 5, the local contrast signals (d17)
are subjected to temporal filtering. In a preferred embodiment, the
time filter 42 is a second-order IIR filter. Parameters of the time
filter 42 are estimated from calibration data. The filtered
coefficients (d18) resulting from the time filter 42 can be expressed
as follows:

$$d_{18}(s,i,c,by,bx,v,u) = b_{2}(c,v,u)\, d_{17}(s,i,c,by,bx,v,u) + \sum_{k=1}^{2} a_{2,k}(c,v,u)\, d_{18}(s,i-k,c,by,bx,v,u) \tag{23}$$

where b2 and a2,k are arrays of filter coefficients. These arrays b2
and a2,k allow different temporal filtering for each DCT frequency
and each color. For simplicity, these arrays b2 and a2,k can be made
constant, independent of c,v,u, or can be made to depend only on c,
or on both c and a simple function of v,u.

Since the image rate of the digital video can vary from application
to application, it is necessary to define the filter arrays b2 and
a2,k in a way that is independent of image-rate. The present method
specifies the time filter 42 in terms of a center frequency w_c and a
tuning factor q. If the time filter 42 were constrained to have a
magnitude of one at the center frequency w_c, and if the display
image rate were w_d, then the filter arrays b2 and a2,k are expressed
by the following equations:

$$a_{2,1} = 2\, e^{\cdots} \cos(\cdots) \tag{24}$$

$$a_{2,2} = -e^{\cdots} \tag{25}$$

It should be understood that filters of higher or lower order can
alternatively be used in the present invention.

DCT Thresholds

Next, a set of contrast thresholds T(c,v,u) is computed for each
color and DCT frequency. These thresholds T(c,v,u) are the product of
a summation factor S, and three functions, one of the color component
c, one of the orientation of the DCT frequency and independent of
color, and one a Gaussian function of DCT radial frequency whose
parameters depend upon color and further upon the horizontal and
vertical processing resolutions pr(c,d). The processing resolutions
pr(c,d) are expressed by the following equation:

$$pr(c,d) = \frac{vr(c,d)\, us(c,d)}{ds(c,d)} \tag{27}$$

where vr(c,d) is the set of resolutions of the input video, in units
of pixels/degree of visual angle, and us(c,d) and ds(c,d) are the
up- and down-sampling factors described earlier. The thresholds
T(c,v,u) are expressed by the following equations:

$$T(c,v,u) = S\, T_{0}(c)\, T_{1}(c,v,u)\, T_{2}(u,v)\, T_{3}(c) \tag{28}$$

$$T_{1}(c,v,u) = \cdots \tag{29}$$

$$T_{2}(u,v) = \cdots, \qquad T_{2}(0,0) = \cdots, \qquad T_{2}(u,0) = 1, \qquad T_{2}(0,v) = 1 \tag{30}$$

$$T_{3}(c) = \left[ \frac{pr(c,1)\, pr(c,2)}{cr(c,1)\, cr(c,2)} \right]^{\cdots} \tag{31}$$

In the latter equation (31), cr(c,d) represent the calibration
resolutions for which the parameters T0(c), f_c, and r are specified.
They are typically the resolutions at which calibration data were
collected.

The processed coefficients (d18) are converted by a threshold scaler
44 to threshold units (d19), by dividing (d18) by their respective
spatial thresholds T, as follows:

$$d_{19}(s,i,c,by,bx,v,u) = \frac{d_{18}(s,i,c,by,bx,v,u)}{T(c,v,u)} \tag{32}$$

Subtraction of Test and Reference

After conversion to threshold units (d19), the units corresponding to
the two sequences (R) and (T) are subtracted by a subtractor 46 to
produce an error sequence (d20), as follows:

$$d_{20}(i,c,by,bx,v,u) = d_{19}(2,i,c,by,bx,v,u) - d_{19}(1,i,c,by,bx,v,u) \tag{33}$$
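As a minimal sketch of the steps just described, the following computes the processing resolutions as video resolution times up-sampling factor divided by down-sampling factor, divides coefficients by their thresholds, and subtracts the two sequences. The numeric values are the example values of Table 2. The function names and the list-of-lists layout are illustrative assumptions, and the threshold array passed in is a placeholder rather than a computed T(c,v,u).

```python
# Example values from Table 2, indexed [color Y, O, Z][direction]
VR = [[32, 32], [32, 16], [32, 16]]   # video resolution, pixels/degree
US = [[1, 1], [1, 2], [1, 2]]         # up-sampling factors
DS = [[1, 1], [2, 2], [2, 2]]         # down-sampling factors
CR = [[32, 32], [16, 16], [16, 16]]   # calibration resolutions, pixels/degree


def processing_resolution(vr, us, ds):
    """pr(c,d) = vr(c,d) * us(c,d) / ds(c,d) for each color and direction."""
    return [[vr[c][d] * us[c][d] / ds[c][d] for d in range(len(vr[c]))]
            for c in range(len(vr))]


def to_threshold_units(coefficients, thresholds):
    """Divide each filtered coefficient by its threshold (one block, [v][u])."""
    return [[value / limit for value, limit in zip(row, limit_row)]
            for row, limit_row in zip(coefficients, thresholds)]


def error_block(units_seq1, units_seq2):
    """Second sequence minus first sequence, element by element."""
    return [[b - a for a, b in zip(row1, row2)]
            for row1, row2 in zip(units_seq1, units_seq2)]


PR = processing_resolution(VR, US, DS)
```

With the example values, the processing resolutions come out equal to the calibration resolutions listed in Table 2.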
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Contrast Masking 

With further reference to FIG. 6, contrast masking is accomplished by
first constructing a masking sequence (d23). The threshold units
(d19) corresponding to the reference sequence (R) are rectified by a
rectifier 215, as follows:

$$d_{21}(i,c,by,bx,v,u) = \left| d_{19}(1,i,c,by,bx,v,u) \right| \tag{34}$$

and are then time-filtered by a first-order, low-pass, discrete IIR
filter 217, with parameters a3 and b3, to generate a filtered masking
sequence (d22). Parameter b3 can be derived from a contrast masking
gain g, a time constant τ2 and the display image-rate w_d, as
follows:

$$b_{3} = \cdots \tag{35}$$

In an alternative embodiment, both τ2 and g can be functions of
c,v,u. The filtered sequence (d22) is then obtained by the following
expression:

$$d_{22}(i,c,by,bx,v,u) = b_{3}\, d_{21}(i,c,by,bx,v,u) + a_{3}\, d_{22}(i-1,c,by,bx,v,u) \tag{36}$$

In an alternative embodiment, the values of the filtered masking
sequence (d22) can be blurred within each block to implement a form
of cross-channel masking, as explained in A. B. Watson and J. A.
Solomon, "A Model of Visual Contrast Gain Control and Pattern
Masking," Journal of the Optical Society of America A, 14, 2378-2390
(1997), which is incorporated herein by reference.

The values of the filtered masking sequence (d22) are then raised to
a power m by an exponentiator 219, wherein any values less than 1 are
replaced by 1. The resulting values (d23) are used to divide the
difference sequence (d20), as expressed below, for generating a
masked error sequence (d24):

$$d_{23}(i,c,by,bx,v,u) = \max\left[ 1,\ \left| d_{22}(i,c,by,bx,v,u) \right|^{m} \right] \tag{37}$$

$$d_{24}(i,c,by,bx,v,u) = \frac{d_{20}(i,c,by,bx,v,u)}{d_{23}(i,c,by,bx,v,u)} \tag{38}$$

This process resembles the traditional contrast masking result in
which contrasts below threshold have no masking effect, and for
contrasts above threshold the effect rises as the mth power of mask
contrast in the threshold units (d19), as explained in G. E. Legge
and J. M. Foley, "Contrast Masking in Human Vision," Journal of the
Optical Society of America, 70(12), 1458-1471 (1980), which is
incorporated herein by reference.
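The masking chain for a single coefficient over time (rectify, low-pass filter, raise to the power m with a floor of 1, then divide the error) can be sketched as follows. The filter parameters are an assumption: the sketch uses a3 = exp(-1/(τ2 w_d)) and b3 = g(1 - a3), a conventional first-order form with DC gain g, which may differ from the patent's own expression for b3. The zero initial filter state and the function names are also assumptions. Default values are the example values of Table 2.

```python
import math


def masking_filter_params(gain=3.0, tau2=0.04, display_rate_hz=60.0):
    """ASSUMED form: first-order low-pass with DC gain equal to the masking gain."""
    a3 = math.exp(-1.0 / (tau2 * display_rate_hz))
    b3 = gain * (1.0 - a3)
    return a3, b3


def masked_errors(reference_units, error_units, m=0.9,
                  gain=3.0, tau2=0.04, display_rate_hz=60.0):
    """Masked error over time for one color, block and DCT frequency.

    reference_units: threshold units of the reference sequence, one per image
    error_units:     error sequence values, one per image
    """
    a3, b3 = masking_filter_params(gain, tau2, display_rate_hz)
    filtered = 0.0  # ASSUMPTION: the filter state starts at zero
    masked = []
    for reference, error in zip(reference_units, error_units):
        rectified = abs(reference)                   # rectifier
        filtered = b3 * rectified + a3 * filtered    # first-order IIR filter
        mask = max(1.0, abs(filtered) ** m)          # values below 1 become 1
        masked.append(error / mask)                  # masked error
    return masked
```

When the reference contrast stays below threshold the mask is 1 and the error passes through unchanged. A strong, sustained reference contrast attenuates the error.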

Error Pooling

Referring back to FIG. 2, the dimensions of the resulting sequence
(d24) are {i, c, by, bx, v, u}, where i is images, c is color
channels, by and bx are the number of blocks in vertical and
horizontal directions, and v,u are the vertical and horizontal DCT
frequencies. These elementary errors can then be combined over a
subset of dimensions Ω, or all dimensions, to yield summary measures
of visual error distributed over the complementary dimensions Ω̄. In
a preferred embodiment, this summation is implemented using a
Minkowski metric as follows:

$$E_{\Omega} = \left( \sum_{\Omega} \left| d_{24}(i,c,by,bx,v,u) \right|^{\beta} \right)^{1/\beta} \tag{39}$$

where β is the pooling exponent.

Different applications can require summation over different subsets
of dimensions. For example, summation over all dimensions except i
would provide a continuous time record of overall quality, while
pooling over all dimensions except u and v (over some number of
images) would indicate visual error as a function of DCT frequency.
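A minimal sketch of Minkowski pooling over a chosen subset of dimensions follows. The dictionary layout, the dimension names and the function name are illustrative assumptions. The exponent default of 4 is the example pooling exponent of Table 2.

```python
from collections import defaultdict

DIMS = ("i", "c", "by", "bx", "v", "u")


def minkowski_pool(errors, pool_over, beta=4.0):
    """Pool masked errors over the named dimensions.

    errors:    dict mapping (i, c, by, bx, v, u) index tuples to masked errors
    pool_over: names of the dimensions to sum over
    Returns a dict keyed by the indices of the dimensions that were kept.
    """
    keep = [n for n, name in enumerate(DIMS) if name not in pool_over]
    sums = defaultdict(float)
    for index, value in errors.items():
        sums[tuple(index[n] for n in keep)] += abs(value) ** beta
    return {key: total ** (1.0 / beta) for key, total in sums.items()}


example = {
    (0, 0, 0, 0, 0, 0): 1.0,
    (0, 0, 0, 0, 0, 1): 1.0,
    (1, 0, 0, 0, 0, 0): 2.0,
}
per_image = minkowski_pool(example, pool_over=("c", "by", "bx", "v", "u"))
overall = minkowski_pool(example, pool_over=DIMS)
```

Here `per_image` is the time record obtained by pooling over every dimension except i, and `overall` collapses all dimensions into a single number stored under the empty key `()`.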

Output

The output of the DVQ apparatus 20 can be either the perceptual error
(E_Ω) or a quality measure (Q_Ω) output by the visual quality
converter 52. The quality measure (Q_Ω) can be computed as follows:

$$Q_{\Omega} = \frac{2}{1 + E_{\Omega}} \tag{40}$$

This quality measure (Q_Ω) has a maximum value of 2, which is reached
when the perceptual error (E_Ω) is zero, and has a value of 1 when
the perceptual error (E_Ω) is at threshold (a value of 1). Other
monotonic transforms of the perceptual error (E_Ω) can alternatively
be employed.
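The conversion from perceptual error to quality can be sketched in a few lines, taking the transform to be Q = 2 / (1 + E), which is the form consistent with the stated maximum of 2 at zero error and the value of 1 at threshold. The function name and the input check are illustrative assumptions.

```python
def quality_from_error(perceptual_error):
    """Map a pooled perceptual error E to the quality measure Q = 2 / (1 + E)."""
    if perceptual_error < 0:
        raise ValueError("a pooled perceptual error cannot be negative")
    return 2.0 / (1.0 + perceptual_error)
```

For example, an error three times threshold gives `quality_from_error(3.0)`, which evaluates to 0.5.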

Exemplary Parameters

The following Table 2 provides exemplary parameters used in the DVQ
apparatus 20. Alternative embodiments can use different values of
these parameters.



TABLE 2

Parameters of the DVQ Apparatus

Parameter            Example value                     Definition                            Unit           Equ.

M (YCbCr to R'G'B')  {{1., -0.002463, 1.36558},        Color transform matrix                               4
                      {1., -0.33356, -0.699821},
                      {1., 1.73185, -0.006097}}
M (RGB to XYZ)       {{40.85, 32.13, 18.95},           Color transform matrix                               7
                      {23.20, 67.62, 7.90},
                      {2.049, 12.20, 104.75}}
M (XYZ to YOZ)       {{0, 1, 0},                       Color transform matrix                               8
                      {0.47, -0.37, -0.1},
                      {0, 0, 1}}
T0(c)                {1/83.19, 1/231.09, 1/27.7}       Global thresholds {Y, O, Z}                          28
v                    {1, 1, 1}                         Veiling light                         CIE XYZ        10
w_v                  60                                Video image rate                      Hz
w_d                  60                                Display image rate                    Hz             24
f_c                  {19.38, 4.85, 4.85}               Spatial corner frequency, {Y, O, Z}   cycles/degree  29
β                    4                                 Pooling exponent                                     30, 39
r                    0.167                             Oblique effect parameter                             30
vr(c, d)             {{32, 32}, {32, 16}, {32, 16}}    Video resolution                      pixels/degree  27
us(c, d)             {{1, 1}, {1, 2}, {1, 2}}          Up-sampling factors                                  27
ds(c, d)             {{1, 1}, {2, 2}, {2, 2}}          Down-sampling factors                                27
cr(c, d)             {{32, 32}, {16, 16}, {16, 16}}    Calibration resolutions               pixels/degree  31
τ1                   0.04                              Light adaptation time constant        seconds        16
τ2                   0.04                              Contrast masking time constant        seconds        35
g                    3                                 Contrast masking gain                                35
m                    0.9                               Contrast masking exponent                            37
S                    3.7                               Summation factor                                     28
q(c)                 {1.3, 1.3, 1.3}                   Temporal filter Q factor, {Y, O, Z}                  24
w_c(c)               {7.31, 7.31, 7.31}                Temporal filter center frequency      Hz             24
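As a worked example using the Table 2 values, the following sketch derives the coefficients of the first-order light adaptation filter from the time constant (0.04 s) and the display image rate (60 Hz). It assumes the usual unit-DC-gain form a1 = exp(-1/(τ1 w_d)) and b1 = 1 - a1. The function names and the zero initial state are illustrative assumptions.

```python
import math


def first_order_lowpass_params(time_constant_s, display_rate_hz):
    """Return (a1, b1) for a unit-DC-gain first-order low-pass IIR filter."""
    a1 = math.exp(-1.0 / (time_constant_s * display_rate_hz))
    b1 = 1.0 - a1
    return a1, b1


def filter_dc(dc_values, a1, b1, initial=0.0):
    """Apply out[i] = b1 * in[i] + a1 * out[i-1] along the image index."""
    state = initial
    filtered = []
    for value in dc_values:
        state = b1 * value + a1 * state
        filtered.append(state)
    return filtered


A1, B1 = first_order_lowpass_params(0.04, 60.0)  # Table 2 example values
```

With these values a1 is roughly 0.659. Because the time constant is given in seconds, the decay per second is the same at any image rate, which is the rate independence the description asks for.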



It should be clear that alternative embodiments of the DVQ apparatus
20 are possible within the scope of the present invention. In one
embodiment, if additional processing speed is desired and the input
is already in an appropriate format, such as the YCbCr color space,
the color transforms and gamma conversion can be omitted.

In an alternative embodiment, if additional processing speed is
desired and the input is already in an appropriate format, such as
the blocked DCT of YCbCr color channels, then the DCT transform, as
well as the color transforms and gamma conversion, can be omitted.

In another alternative embodiment, the subtraction of the test (T)
and reference (R) sequences by the subtractor 46 can be postponed
until after the contrast masking is implemented by the contrast
masking processor 48. The contrast masking process can be combined
with an alternate masking formulation in which each DCT coefficient
is divided by a rectified, filtered set of neighboring coefficients,
with a small added constant.

In still another embodiment, rather than using a single set of time
filter coefficients (b2, a2,1, a2,2), a matrix with one entry for
each color and DCT frequency can be used. This does not substantially
increase the computational complexity, but improves the accuracy of
the temporal filtering model.
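A second-order time filter with its own coefficient set per color channel can be sketched as below. The coefficient design is an assumption, not the patent's formula: it uses a standard two-pole resonator with pole angle 2π w_c / w_d and pole radius exp(-π w_c / (q w_d)), and scales the input gain so that the magnitude response equals one at the center frequency. The names, the dictionary layout and the zero initial state are illustrative. The numeric values are the Table 2 examples.

```python
import cmath
import math


def resonator_coefficients(center_hz, q, display_rate_hz):
    """Return (b2, a21, a22) for one channel (ASSUMED two-pole resonator design)."""
    theta = 2.0 * math.pi * center_hz / display_rate_hz
    radius = math.exp(-math.pi * center_hz / (q * display_rate_hz))
    a21 = 2.0 * radius * math.cos(theta)
    a22 = -radius * radius
    z_inv = cmath.exp(-1j * theta)
    # Scale the input gain so the magnitude response is one at the center frequency.
    b2 = abs(1.0 - a21 * z_inv - a22 * z_inv * z_inv)
    return b2, a21, a22


def temporal_filter(samples, b2, a21, a22):
    """out[i] = b2*in[i] + a21*out[i-1] + a22*out[i-2], zero initial state."""
    prev1 = prev2 = 0.0
    filtered = []
    for value in samples:
        current = b2 * value + a21 * prev1 + a22 * prev2
        filtered.append(current)
        prev2, prev1 = prev1, current
    return filtered


Q_FACTOR = {"Y": 1.3, "O": 1.3, "Z": 1.3}
CENTER_HZ = {"Y": 7.31, "O": 7.31, "Z": 7.31}
DISPLAY_RATE_HZ = 60.0

# One coefficient set per color; a per-frequency table would add (v, u) keys.
COEFFS = {color: resonator_coefficients(CENTER_HZ[color], Q_FACTOR[color],
                                        DISPLAY_RATE_HZ)
          for color in ("Y", "O", "Z")}
```

Extending the lookup key from color to (color, v, u) gives the per-frequency matrix of coefficients without changing the filtering loop, which is why the added cost is small.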

One skilled in the art will appreciate that the present 
invention can be practiced by other than the described 
embodiments or values, which are presented for purposes of 
illustration and not of limitation. For example, while the 
DVQ apparatus and method are described in terms of 
discrete components, it should be clear that the function of 
these components can be implemented by means of a 
software program. 

What is claimed is: 

1. A digital video quality method for evaluating the visual
quality of a processed (T) video sequence relative to an
original (R) video sequence, the method comprising:

sampling the original and processed video sequences to
generate sampled sequences (d1) therefrom;

limiting the processing of said sampled sequences (d1) to
a region of interest and generating region of interest
sequences (d2) therefrom;

transforming said region of interest sequences (d2) to
local contrast coefficients (d17);

filtering said local contrast coefficients (d17) to generate
filtered components (d18) therefrom;

converting said filtered components (d18) to threshold
units (d19);

subtracting said threshold units (d19) corresponding to the
original (R) and processed (T) sequences to obtain an
error sequence (d20);

subjecting said error sequence (d20) to a contrast masking
operation to generate a masked error sequence (d24)
therefrom; and

pooling said masked error sequence (d24) to generate a
perceptual error (E_Ω).

2. A method according to claim 1, further including
converting said perceptual error (E_Ω) to a visual quality
measure (Q_Ω), to provide an output in terms of quality.

3. A method according to claim 1, further including
feeding back said perceptual error (E_Ω) to a codec, for
regulating a compression bit rate to correspond to a desired
image visual quality.

4. A method according to claim 2, further including
feeding back said visual quality measure (Q_Ω) to a codec, for
regulating a compression bit rate to correspond to a desired
image visual quality.

5. A method according to claim 1, wherein each of said
processed (T) video sequence and said original (R) video
sequence includes color channels; and

wherein said color channels are converted by a color
transformer to a perceptually relevant color space, to
generate color transformed sequences (d9) from said
region of interest sequences (d2).

6. A method according to claim 5, further including
subjecting said color transformed sequences (d9) to blocking
to generate blocks.

7. A method according to claim 6, further including
converting said blocks to a block of frequency coefficients
(d10) by means of a discrete cosine transformer.

8. A method according to claim 7, wherein said block of
frequency coefficients (d10) are converted to a local contrast
signal (d17) by means of a local contrast converter; and

wherein said local contrast signal (d17) includes a
combination of AC coefficients (d17a) and DC coefficients
(d17b).

9. A method according to claim 8, wherein contrast
masking is accomplished by rectifying said threshold units

10. A method according to claim 5, wherein said region of
interest sequences (d2) are transformed from their native
color space to gamma-corrected color channels R', G', and B'
by a R'G'B' transformer.

11. A method according to claim 10, further including
converting said color channels R', G', and B' to RGB color
channels by a RGB transformer.

12. A method according to claim 11, further including
converting said RGB color channels to XYZ color coordinates
by a XYZ transformer.

13. A method according to claim 12, further including
converting said XYZ color coordinates to YOZ color coordinates
by a YOZ transformer.

14. A method according to claim 13, wherein if any of the
processed (T) video sequence or the original (R) video
sequence contains interlaced video fields, then
de-interlacing said interlaced fields to a progressive
sequence (d7) by means of a de-interlacer.

15. A method according to claim 14, wherein
de-interlacing is implemented by inserting blank lines into
even numbered lines in odd fields, and odd numbered lines
in even fields.

16. A method according to claim 14, wherein
de-interlacing is implemented by inserting blank lines into
even numbered lines in even fields, and odd numbered lines
in odd fields.

17. A method according to claim 14, wherein
de-interlacing is implemented by each pair of odd and even
video fields as an image.

18. A method according to claim 14, further including
adding a veiling light to said progressive sequence (d7) by
means of a veiling light combiner.

19. A method according to claim 1, wherein sampling
includes pixel-replication.

20. A digital video quality apparatus with an original (R)
video sequence and a processed (T) video sequence being
fed thereto, the apparatus comprising:

a sampler for sampling the original and processed video
sequences to generate sampled sequences (d1) therefrom;

a region-of-interest processor for limiting the processing
of said sampled sequences (d1) to a region of interest
and for generating region of interest sequences (d2)
therefrom;

a local contrast converter for transforming said region of
interest sequences (d2) to local contrast coefficients
(d17);

a time filter for filtering said local contrast coefficients
(d17) and for generating filtered components (d18)
therefrom;

a threshold scaler for converting said filtered components
(d18) to threshold units (d19);

a subtractor for subtracting said threshold units (d19)
corresponding to the original (R) and processed (T)
sequences to obtain an error sequence (d20);

a contrast masking processor for subjecting said error
sequence (d20) to a contrast masking operation and for
generating a masked error sequence (d24) therefrom;
and

a pooling processor for pooling said masked error
sequence (d24) to generate a perceptual error (E_Ω).

21. An apparatus according to claim 20, further including
a visual quality converter that converts said perceptual error
(E_Ω) to a visual quality measure (Q_Ω), for providing an
output in terms of quality.

22. An apparatus according to claim 21, further including
a codec to which said perceptual error (E_Ω) is fed back for
regulating a compression bit rate to correspond to a desired
image visual quality.

23. An apparatus according to claim 21, further including
a codec to which said visual quality measure (Q_Ω) is fed
back for regulating a compression bit rate to correspond to
a desired image visual quality.

24. An apparatus according to claim 20, wherein each of
said processed (T) video sequence and said original (R)
video sequence includes color channels; and

a color transformer that converts said color channels to a
perceptually relevant color space, for generating color
transformed sequences (d9) from said region of interest
sequences (d2).

25. An apparatus according to claim 24, further including
a block constructor that subjects said color transformed
sequences (d9) to blocking, in order to generate blocks.

26. An apparatus according to claim 25, further including
a discrete cosine transformer for converting said blocks to a
block of frequency coefficients (d10).

27. An apparatus according to claim 26, wherein if any of
the processed (T) video sequence or the original (R) video
sequence contains interlaced video fields, then
de-interlacing said interlaced fields to a progressive
sequence (d7) by means of a de-interlacer.

28. An apparatus according to claim 27, further including
a veiling light combiner for adding a veiling light to said
progressive sequence (d7).

29. An apparatus according to claim 28, further including
a local contrast converter for converting said block of
frequency coefficients (d10) to a local contrast signal (d17);
and

wherein said local contrast signal (d17) includes a
combination of AC coefficients (d17a) and DC coefficients
(d17b).

30. A digital video quality apparatus with original (R)
video sequence and a processed (T) video signal being fed
thereto, the apparatus comprising:

a sampler for sampling the original and processed video
sequences and for generating sampled sequences (d1)
therefrom;

a region-of-interest processor for limiting the processing
of said sampled sequences (d1) to a region of interest
and for generating region of interest sequences (d2)
therefrom;

a local contrast converter for transforming said region of
interest sequences (d2) to local contrast coefficients
(d17);

a time filter for filtering said local contrast coefficients
(d17) and for generating filtered components (d18)
therefrom;

a threshold scaler for converting said filtered components
(d18) to threshold units (d19);

a subtractor for subtracting said threshold units (d19)
corresponding to the original (R) and processed (T)
sequences to obtain an error sequence (d20);

a contrast masking processor for subjecting said subtracted
error sequence (d20) to a contrast masking operation and
generating a masked error sequence (d24); and

a pooling processor for pooling said error sequence (d20)
to generate a perceptual error (E_Ω).

31. An apparatus according to claim 30 further including
a visual quality converter for converting said perceptual
error (E_Ω) to visual quality measure (Q_Ω) to provide output
in terms of quality.

32. A digital quality method for evaluating the visual
quality of a processed (T) video sequence relative to an
original (R) video sequence, the method comprising:

sampling the original and processed video sequences and
for generating sampled sequences (d1) therefrom;

limiting the processing of said sampled sequences (d1) to
a region of interest and for generating region of interest
sequences (d2) therefrom;

transferring said region of interest sequences (d2) to local
contrast coefficients (d17);

filtering said local contrast coefficients (d17) and for
generating components (d18) therefrom;

subtracting said threshold units (d19) to obtain an error
sequence (d20);

subjecting said error sequence (d20) to a contrast masking
operation to obtain a masked error sequence (d24); and

pooling said error sequence (d20) to generate a perceptual
error (E_Ω).
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